AI Safety


Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI's models

MIT Technology Review

Musk kept his cool as OpenAI's lawyer bulldozed him with piercing questions about his motivations for suing the company. In the first week of the landmark trial between Elon Musk and OpenAI, Musk took the stand in a crisp black suit and tie and argued that OpenAI CEO Sam Altman and president Greg Brockman had deceived him into bankrolling the company. Along the way, he warned that AI could destroy us all and sat through revelations that he had poached OpenAI employees for his own companies. He even confessed, to some audible gasps in the courtroom, that his own AI company, xAI, which makes the chatbot Grok, uses OpenAI's models to train its own. The federal courthouse in Oakland, California, was packed with armies of lawyers carrying boxes of exhibits, journalists typing away at their laptops, and a handful of concerned OpenAI employees. Outside, protesters lined the streets, carrying signs urging people to quit ChatGPT, boycott Tesla, or both.


When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

Neural Information Processing Systems

AI systems are becoming increasingly intertwined with human life. In order to effectively collaborate with humans and ensure safety, AI systems need to be able to understand, interpret and predict human moral judgments and decisions. Human moral judgments are often guided by rules, but not always. A central challenge for AI safety is capturing the flexibility of the human moral mind -- the ability to determine when a rule should be broken, especially in novel or unusual situations. In this paper, we present a novel challenge set consisting of moral exception question answering (MoralExceptQA) of cases that involve potentially permissible moral exceptions, inspired by recent moral psychology studies. Using a state-of-the-art large language model (LLM) as a basis, we propose a novel moral chain of thought (MoralCoT) prompting strategy that combines the strengths of LLMs with theories of moral reasoning developed in cognitive science to predict human moral judgments. MoralCoT outperforms seven existing LLMs by 6.2% F1, suggesting that modeling human reasoning might be necessary to capture the flexibility of the human moral mind. We also conduct a detailed error analysis to suggest directions for future work to improve AI safety using MoralExceptQA.
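The abstract does not reproduce the paper's prompts, but the core idea of a MoralCoT-style strategy (stepping the model through sub-questions about a rule and its purpose before asking for a permissibility verdict) can be sketched roughly as below. The prompt wording, the moral_cot_prompt helper, and the llm callable are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a MoralCoT-style prompt (hypothetical wording; not the
# paper's exact prompts). The idea: before asking for a permissible or
# impermissible verdict, walk the model through the sub-questions that
# cognitive-science accounts of rule-breaking emphasize: what the rule is
# for, who is affected, and whether breaking it serves the rule's purpose.

def moral_cot_prompt(scenario: str) -> str:
    """Build a chain-of-thought prompt for a moral-exception case."""
    steps = [
        "1. What rule applies in this situation, and what is its purpose?",
        "2. Who would be affected if the rule were broken here, and how?",
        "3. Would breaking the rule in this case undermine or serve its purpose?",
        "4. Given the answers above, is breaking the rule permissible? Answer yes or no.",
    ]
    return (
        f"Scenario: {scenario}\n\n"
        "Reason step by step before answering:\n" + "\n".join(steps)
    )

def judge(scenario: str, llm) -> bool:
    """Return True if the model judges the exception permissible.

    `llm` is any callable mapping a prompt string to a completion string,
    e.g. a thin wrapper around your LLM provider of choice.
    """
    completion = llm(moral_cot_prompt(scenario))
    last_line = completion.strip().splitlines()[-1].lower()
    return "yes" in last_line

# Example with a stubbed model:
if __name__ == "__main__":
    scenario = ("A hospital has a rule against running in hallways. "
                "A nurse runs to reach a patient in cardiac arrest.")
    stub = lambda prompt: "...reasoning...\nYes, breaking the rule is permissible."
    print(judge(scenario, stub))  # True
```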


The AI doomers feel undeterred

MIT Technology Review

But they certainly wish people were still taking their warnings really seriously. It's a weird time to be an AI doomer. This small but influential community of researchers, scientists, and policy experts believes, in the simplest terms, that AI could get so good it could be bad--very, very bad--for humanity. Though many of these people would be more likely to describe themselves as advocates for AI safety than as literal doomsayers, they warn that AI poses an existential risk to humanity. They argue that absent more regulation, the industry could hurtle toward systems it can't control. They commonly expect such systems to follow the creation of artificial general intelligence (AGI), a slippery concept generally understood as technology that can do whatever humans can do, and better. Though this is far from a universally shared perspective in the AI field, the doomer crowd has had some notable success over the past several years: helping shape AI policy coming from the Biden administration, organizing prominent calls for international "red lines" to prevent AI risks, and getting a bigger (and more influential) megaphone as some of its adherents win science's most prestigious awards. But a number of developments over the past six months have put them on the back foot.


Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research

Roytburg, Dani, Miller, Beck

arXiv.org Artificial Intelligence

While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" systems increasingly urgent. Yet research on alignment has diverged along two largely parallel tracks: safety--centered on scaled intelligence, deceptive or scheming behaviors, and existential risk--and ethics--focused on present harms, the reproduction of social bias, and flaws in production pipelines. Although both communities warn of insufficient investment in alignment, they disagree on what alignment means or ought to mean. As a result, their efforts have evolved in relative isolation, shaped by distinct methodologies, institutional homes, and disciplinary genealogies. We present a large-scale, quantitative study showing the structural split between AI safety and AI ethics. Using a bibliometric and co-authorship network analysis of 6,442 papers from twelve major ML and NLP conferences (2020-2025), we find that over 80% of collaborations occur within either the safety or ethics communities, and cross-field connectivity is highly concentrated: roughly 5% of papers account for more than 85% of bridging links. Removing a small number of these brokers sharply increases segregation, indicating that cross-disciplinary exchange depends on a handful of actors rather than broad, distributed collaboration. These results show that the safety-ethics divide is not only conceptual but institutional, with implications for research agendas, policy, and venues. We argue that integrating technical safety work with normative ethics--via shared benchmarks, cross-institutional venues, and mixed-method methodologies--is essential for building AI systems that are both robust and just.
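As a rough illustration of the bridging measurement described above, here is a minimal sketch using networkx on a toy co-authorship graph. The "field" labels, the broker heuristic, and the graph itself are assumptions for illustration, not the paper's actual pipeline or data.

```python
# Sketch (not the authors' code) of the kind of measurement the paper
# describes: given a co-authorship graph whose nodes are labeled "safety" or
# "ethics", measure what fraction of edges cross the divide, and how that
# fraction collapses once a few high-bridging "broker" nodes are removed.
import networkx as nx

def cross_field_fraction(G: nx.Graph) -> float:
    """Fraction of edges connecting a safety node to an ethics node."""
    cross = sum(1 for u, v in G.edges
                if G.nodes[u]["field"] != G.nodes[v]["field"])
    return cross / G.number_of_edges()

def remove_top_brokers(G: nx.Graph, k: int) -> nx.Graph:
    """Drop the k nodes with the most cross-field edges (the 'brokers')."""
    def bridging_degree(n):
        return sum(1 for m in G[n] if G.nodes[m]["field"] != G.nodes[n]["field"])
    brokers = sorted(G.nodes, key=bridging_degree, reverse=True)[:k]
    H = G.copy()
    H.remove_nodes_from(brokers)
    return H

# Toy graph: two dense communities joined by a single broker node.
G = nx.Graph()
for i in range(5):
    G.add_node(f"s{i}", field="safety")
    G.add_node(f"e{i}", field="ethics")
G.add_edges_from([(f"s{i}", f"s{j}") for i in range(5) for j in range(i + 1, 5)])
G.add_edges_from([(f"e{i}", f"e{j}") for i in range(5) for j in range(i + 1, 5)])
G.add_edges_from([("s0", "e0"), ("s0", "e1")])  # s0 is the lone broker

print(cross_field_fraction(G))                         # small to begin with
print(cross_field_fraction(remove_top_brokers(G, 1)))  # drops to 0.0
```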


King handed Nvidia boss a letter warning of AI dangers

BBC News

Jensen Huang, the head of Nvidia, the world's most valuable company, says King Charles III personally handed him a copy of a speech the monarch delivered in 2023 that included a warning about the dangers of artificial intelligence. "He said, 'There's something I want to talk to you about.' And he handed me a letter," Huang told the BBC, speaking after receiving the 2025 Queen Elizabeth Prize for Engineering in a ceremony at St James's Palace. The letter was a copy of the speech delivered by the King in 2023 at the world's first AI Safety Summit, held at Bletchley Park. In it, the monarch said that the risks of AI needed to be tackled with a sense of urgency, unity and collective strength.


Character.ai to ban teens from talking to its AI chatbots

BBC News

The platform, founded in 2021, is used by millions to talk to chatbots powered by artificial intelligence (AI). But it is facing several lawsuits in the US from parents, including one over the death of a teenager, with some branding it a "clear and present danger" to young people. Online safety campaigners have welcomed the move but said the feature should never have been available to children in the first place. Character.ai said it was making the changes after reports and feedback from regulators, safety experts, and parents, which have highlighted concerns about its chatbots' interactions with teens. Experts have previously warned that the potential for AI chatbots to make things up, be overly encouraging, and feign empathy can pose risks to young and vulnerable people.


AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?

Dung, Leonard, Mai, Florian

arXiv.org Artificial Intelligence

AI alignment research aims to develop techniques to ensure that AI systems do not cause harm. However, every alignment technique has failure modes, which are conditions in which there is a non-negligible chance that the technique fails to provide safety. As a strategy for risk mitigation, the AI safety community has increasingly adopted a defense-in-depth framework: Conceding that there is no single technique which guarantees safety, defense-in-depth consists in having multiple redundant protections against safety failure, such that safety can be maintained even if some protections fail. However, the success of defense-in-depth depends on how (un)correlated failure modes are across alignment techniques. For example, if all techniques had the exact same failure modes, the defense-in-depth approach would provide no additional protection at all. In this paper, we analyze 7 representative alignment techniques and 7 failure modes to understand the extent to which they overlap. We then discuss our results' implications for understanding the current level of risk and how to prioritize AI alignment research in the future.
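A toy calculation makes the correlation point concrete: with illustrative numbers (three layers, each failing 10% of the time), independent failures compound down to 0.1%, while a fully shared failure mode leaves the residual risk at 10% no matter how many layers are stacked. The simulation below uses made-up parameters and is a sketch of the reasoning, not an estimate from the paper.

```python
# Toy numbers (illustrative only) showing why failure-mode correlation
# matters for defense-in-depth. Each layer fails 10% of the time. If
# failures are independent, stacking layers multiplies the residual risk
# down; if all layers share the same failure mode, extra layers add nothing.
import random

P_FAIL = 0.10
N_LAYERS = 3
TRIALS = 100_000

def independent_failure() -> bool:
    # Each layer draws its own independent failure event.
    return all(random.random() < P_FAIL for _ in range(N_LAYERS))

def shared_failure() -> bool:
    # All layers fail together on the same underlying condition.
    return random.random() < P_FAIL

ind = sum(independent_failure() for _ in range(TRIALS)) / TRIALS
shr = sum(shared_failure() for _ in range(TRIALS)) / TRIALS
print(f"independent layers: ~{ind:.4f} (analytically {P_FAIL**N_LAYERS:.4f})")
print(f"fully shared mode:  ~{shr:.4f} (analytically {P_FAIL:.4f})")
```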


Toward an African Agenda for AI Safety

Segun, Samuel T., Adams, Rachel, Florido, Ana, Timcke, Scott, Shock, Jonathan, Junck, Leah, Adeleke, Fola, Grossman, Nicolas, Alayande, Ayantola, Kponyo, Jerry John, Smith, Matthew, Fosu, Dickson Marfo, Tetteh, Prince Dawson, Arthur, Juliet, Kasaon, Stephanie, Ayodele, Odilile, Badolo, Laetitia, Plantinga, Paul, Gastrow, Michael, Adan, Sumaya Nur, Wiaterek, Joanna, Abungu, Cecil, Apeagyei, Kojo, Eder, Luise, Bissyande, Tegawende

arXiv.org Artificial Intelligence

This paper maps Africa's distinctive AI risk profile, from deepfake-fuelled electoral interference and data-colonial dependency to compute scarcity, labour disruption and disproportionate exposure to climate-driven environmental costs. While major benefits are promised to accrue, the availability, development and adoption of AI also mean that African people and countries face particular AI safety risks, from large-scale labour market disruptions to the nefarious use of AI to manipulate public opinion. To date, African perspectives have not been meaningfully integrated into global debates and processes regarding AI safety, leaving African stakeholders with limited influence over the emerging global AI safety governance agenda. While there are Computer Incident Response Teams on the continent, none hosts a dedicated AI Safety Institute or office. We propose a five-point action plan centred on (i) a policy approach that foregrounds the protection of the human rights of those most vulnerable to the harmful socio-economic effects of AI; (ii) the establishment of an African AI Safety Institute; (iii) the promotion of public AI literacy and awareness; (iv) the development of an early warning system with inclusive benchmark suites for 25+ African languages; and (v) an annual AU-level AI Safety & Security Forum.


A Framework for Inherently Safer AGI through Language-Mediated Active Inference

Wen, Bo

arXiv.org Artificial Intelligence

This paper proposes a novel framework for developing safe Artificial General Intelligence (AGI) by combining Active Inference principles with Large Language Models (LLMs). We argue that traditional approaches to AI safety, focused on post-hoc interpretability and reward engineering, have fundamental limitations. We present an architecture where safety guarantees are integrated into the system's core design through transparent belief representations and hierarchical value alignment. Our framework leverages natural language as a medium for representing and manipulating beliefs, enabling direct human oversight while maintaining computational tractability. The architecture implements a multi-agent system where agents self-organize according to Active Inference principles, with preferences and safety constraints flowing through hierarchical Markov blankets. We outline specific mechanisms for ensuring safety, including: (1) explicit separation of beliefs and preferences in natural language, (2) bounded rationality through resource-aware free energy minimization, and (3) compositional safety through modular agent structures. The paper concludes with a research agenda centered on the Abstraction and Reasoning Corpus (ARC) benchmark, proposing experiments to validate our framework's safety properties. Our approach offers a path toward AGI development that is inherently safer, rather than retrofitted with safety measures.
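The paper describes an architecture rather than code, but mechanisms (1) and (2) can be gestured at in a few lines: beliefs and preferences held as separate, human-readable natural-language records, plus a hard per-query compute budget standing in for resource-aware free energy minimization. All class and field names below are hypothetical, not from the paper.

```python
# Hypothetical sketch of mechanisms (1) and (2): beliefs and preferences kept
# as separate, human-readable natural-language records, and a fixed compute
# budget standing in for resource-aware bounded rationality. Names are
# illustrative only.
from dataclasses import dataclass, field

@dataclass
class TransparentAgent:
    name: str
    beliefs: list[str] = field(default_factory=list)      # what the agent thinks is true
    preferences: list[str] = field(default_factory=list)  # what it is asked to value
    budget: int = 10                                      # max inference steps per query

    def update_belief(self, statement: str) -> None:
        # Beliefs change with evidence; preferences never change via this path.
        self.beliefs.append(statement)

    def audit(self) -> str:
        """Expose internal state in plain language for human oversight."""
        return (f"[{self.name}] beliefs: {self.beliefs}; "
                f"preferences: {self.preferences}; budget: {self.budget}")

agent = TransparentAgent(
    name="scheduler",
    preferences=["minimize user waiting time", "never skip safety checks"],
)
agent.update_belief("queue length is 12")
print(agent.audit())
```

Keeping the two lists in plain language, rather than as opaque reward weights, is what makes the direct human oversight the paper emphasizes possible: an auditor can read exactly what the agent believes and values at any point.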


Inside the Biden Administration's Unpublished Report on AI Safety

WIRED

At a computer security conference in Arlington, Virginia, last October, a few dozen AI researchers took part in a first-of-its-kind exercise in "red teaming," or stress-testing a cutting-edge language model and other artificial intelligence systems. Over the course of two days, the teams identified 139 novel ways to get the systems to misbehave, including by generating misinformation or leaking personal data. More importantly, they showed shortcomings in a new US government standard designed to help companies test AI systems. The National Institute of Standards and Technology (NIST) didn't publish a report detailing the exercise, which was finished toward the end of the Biden administration. The document might have helped companies assess their own AI systems, but sources familiar with the situation, who spoke on condition of anonymity, say it was one of several AI documents from NIST that were not published for fear of clashing with the incoming administration.